Example subsampling is a technique typically used with sequential ensembles; parallel ensembles usually rely on bootstrap sampling instead.
As with mini-batches in gradient descent, subsampling (sampling without replacement) means that each base learner sees a different set of observations. Because different subsets of observations pull the fit in different directions, smaller samples cause the model to “jump around” more with each additional base learner. However, a small learning rate coupled with a small sample size can have a regularizing effect, just as stochastic gradient descent tends to learn a more generalizable fit than gradient descent without sampling.
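As a minimal sketch of what this looks like in practice, here is how example subsampling and a small learning rate are typically combined in XGBoost’s scikit-learn API. The synthetic data and parameter values are purely illustrative, not recommendations.

```python
import numpy as np
import xgboost as xgb

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=1_000)

model = xgb.XGBRegressor(
    n_estimators=500,     # more base learners to compensate for the small step size
    learning_rate=0.05,   # small learning rate: each tree contributes only a little
    subsample=0.5,        # each tree sees a different 50% of rows, sampled without replacement
)
model.fit(X, y)
```

The small `learning_rate` damps the jumping around that `subsample` introduces, which is where the regularizing effect comes from.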
It’s not really bagging (but we call it that)
Bootstrap sampling means sampling with replacement. Neither XGBoost nor LightGBM samples with replacement when building a given tree, so example subsampling is not actually bootstrap aggregation (“bagging”).
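A small illustration of the difference, with made-up sizes: a bootstrap sample draws with replacement (duplicates allowed, roughly 63% unique rows on average), while a subsample draws a fixed fraction without replacement (no duplicates).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

bootstrap_idx = rng.choice(n, size=n, replace=True)        # bagging-style bootstrap sample
subsample_idx = rng.choice(n, size=n // 2, replace=False)  # XGBoost/LightGBM-style subsample

print(len(np.unique(bootstrap_idx)) / n)  # roughly 0.63: many rows repeated, many left out
print(len(np.unique(subsample_idx)))      # exactly 500: every selected row is distinct
```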
Nevertheless, LightGBM calls it that, as do many practitioners. Why? Because we draw a new (sub)sample for each base learner and combine all of those learners into a single ensemble. If you squint, the end result resembles that of a bagging ensemble.
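You can see the naming directly in LightGBM’s parameters. This sketch uses the native training API with illustrative synthetic data and values; the `bagging_*` names refer to per-tree row subsampling without replacement, not a true bootstrap.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=1_000)

params = {
    "objective": "regression",
    "learning_rate": 0.05,
    "bagging_fraction": 0.5,  # fraction of rows each tree sees -- "bagging" in name only
    "bagging_freq": 1,        # resample the rows before every tree
}
booster = lgb.train(params, lgb.Dataset(X, y), num_boost_round=500)
```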
That said, the process is very different, and I think it’s safe to say that this terminology introduces some regrettable confusion and ambiguity.